Before moving forward with the to-do list, let’s throw a Random Forest at it.
Gradient boost
For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with the polynomial OLS as the baseline, because the correlations made it evident that the relationship between temperature and consumption follows a polynomial shape. But let’s go back to a beloved RF.
Model Cards provide a framework for transparent, responsible reporting.
Use the vetiver `.qmd` Quarto template as a place to start, with `vetiver.model_card()`
Writing pin:
Name: 'wd-gb'
Version: 20241220T100645Z-e0a91
<vetiver.vetiver_model.VetiverModel at 0x7ff0102b16f0>
Metrics
| | train | test | test | train |
|---|---|---|---|---|
| MAE - Mean Absolute Error | 1.341786 | 1.960490 | 1.922536 | 1.264594 |
| MSE - Mean Squared Error | 3.453180 | 14.405643 | 9.411544 | 2.933506 |
| RMSE - Root Mean Squared Error | 1.858273 | 3.795477 | 2.727256 | 1.712577 |
| R2 - Coefficient of Determination | 0.963603 | 0.804279 | -1.129475 | 0.970024 |
| MAPE - Mean Absolute Percentage Error | 0.125605 | 0.193085 | 0.312064 | 0.104361 |
| EVS - Explained Variance Score | 0.963603 | 0.814589 | -0.267563 | 0.970024 |
| MeAE - Median Absolute Error | 0.992591 | 1.387896 | 1.349183 | 0.969084 |
| D2 - D2 Absolute Error Score | 0.812036 | 0.674332 | -0.307344 | 0.819926 |
| Pinball - Mean Pinball Loss | 0.670893 | 0.980245 | 0.961268 | 0.632297 |
Observed vs. Predicted and Residuals vs. Predicted
Check for …
check the residuals to assess the goodness of fit.
- white noise or is there a pattern?
- heteroscedasticity?
- non-linearity?
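The checklist above can also be backed by a few numbers. A minimal sketch, assuming the observed and predicted values are available as arrays: it bins the residuals by predicted value and reports their mean and spread per bin. The function name and the synthetic data are illustrative only.

```python
import numpy as np


def residual_spread_by_prediction(y_obs, y_pred, n_bins=4):
    """Mean and std of residuals within quantile bins of the predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_obs - y_pred
    # Sort by prediction and split into bins of (nearly) equal size
    order = np.argsort(y_pred)
    rows = []
    for idx in np.array_split(order, n_bins):
        rows.append((y_pred[idx].mean(), residuals[idx].mean(), residuals[idx].std()))
    return rows


# Synthetic example: noise grows with the predicted value (heteroscedastic)
rng = np.random.default_rng(7)
pred = np.linspace(1.0, 10.0, 400)
obs = pred + rng.normal(scale=0.2 * pred)
spread = residual_spread_by_prediction(obs, pred)

for mean_pred, mean_res, std_res in spread:
    print(f"pred~{mean_pred:5.2f}  mean residual {mean_res:+.3f}  std {std_res:.3f}")
```

A residual mean that drifts away from zero across bins hints at non-linearity; a standard deviation that changes across bins hints at heteroscedasticity.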
Normality of Residuals:
Check for …
- Are residuals normally distributed?
Residuals Autocorrelation Plot
Residuals vs Time
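For the autocorrelation plot, the underlying numbers are easy to compute by hand. A minimal sketch in plain NumPy, with synthetic residuals standing in for the model's; `residual_acf` is an illustrative helper, not a function from the original notebook:

```python
import numpy as np


def residual_acf(residuals, max_lag=24):
    """Sample autocorrelation of the residuals for lags 0..max_lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    return np.array([np.dot(r[: len(r) - k], r[k:]) / denom for k in range(max_lag + 1)])


# Synthetic AR(1) residuals: each value carries over 0.8 of the previous one
rng = np.random.default_rng(7)
noise = rng.normal(size=2000)
ar1 = np.zeros_like(noise)
for t in range(1, len(noise)):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

acf_white = residual_acf(noise)
acf_ar1 = residual_acf(ar1)
# Approximate 95% band for white noise
band = 1.96 / np.sqrt(len(noise))
```

Residuals that behave like white noise stay mostly inside `±band` at every lag above zero; bars that stick out for many lags, as in `acf_ar1`, mean the model is leaving time structure unexplained.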
Again, it overfits a lot: R2 is 0.96 on train but only 0.80 on test.
Parameter: param_model__learning_rate
Parameter: param_model__max_depth
Parameter: param_model__min_samples_leaf
Parameter: param_model__min_samples_split
Parameter: param_model__n_estimators
Parameter: param_model__subsample
Parameter: param_vars__columns
Best model
{'model__learning_rate': 0.1,
'model__max_depth': 5,
'model__min_samples_leaf': 5,
'model__min_samples_split': 48,
'model__n_estimators': 60,
'model__subsample': 1,
'vars__columns': ['rf_tu_mean', 'vp_std_mean']}
Pipeline(steps=[('vars', ColumnSelector(columns=['rf_tu_mean', 'vp_std_mean'])),
('model',
GradientBoostingRegressor(max_depth=5, min_samples_leaf=5,
min_samples_split=48,
n_estimators=60, random_state=7,
subsample=1))])
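The step names `vars` and `model` and the `param_model__...` names above follow scikit-learn's `<step>__<parameter>` convention for tuning a `Pipeline`. A minimal sketch of such a search, under several assumptions: the `ColumnSelector` below is a hypothetical stand-in for the notebook's own class, the data is synthetic, the grid is deliberately tiny, and `GridSearchCV` with `TimeSeriesSplit` is one plausible setup rather than the one actually used.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the notebook's ColumnSelector: keeps the named columns."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]


# Synthetic stand-in data; only the column names come from the output above
rng = np.random.default_rng(7)
X = pd.DataFrame(
    rng.normal(size=(120, 3)), columns=["tt_tu_mean", "rf_tu_mean", "vp_std_mean"]
)
y = 2.0 * X["rf_tu_mean"] - X["vp_std_mean"] + rng.normal(scale=0.1, size=120)

pipe = Pipeline(
    steps=[
        ("vars", ColumnSelector(columns=["rf_tu_mean", "vp_std_mean"])),
        ("model", GradientBoostingRegressor(random_state=7)),
    ]
)

# Deliberately tiny grid so the sketch runs quickly
param_grid = {
    "model__max_depth": [2, 5],
    "model__n_estimators": [20, 60],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # assumption: folds respect time order
    scoring="neg_mean_absolute_error",
    return_train_score=True,
)
search.fit(X, y)

best_params = search.best_params_
# cv_results_ holds one "param_<step>__<name>" entry per tuned parameter
tuned_columns = sorted(k for k in search.cv_results_ if k.startswith("param_"))
```

Setting `return_train_score=True` keeps the train scores next to the validation scores in `cv_results_`, which makes the train/test gap visible for each candidate.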
Metrics
| | train | test | test | train |
|---|---|---|---|---|
| MAE - Mean Absolute Error | 1.510140 | 1.945387 | 2.101590 | 1.535597 |
| MSE - Mean Squared Error | 4.979002 | 15.336298 | 7.821989 | 5.067173 |
| RMSE - Root Mean Squared Error | 2.231368 | 3.916159 | 2.655873 | 2.248227 |
| R2 - Coefficient of Determination | 0.947521 | 0.791635 | -1.358771 | 0.948123 |
| MAPE - Mean Absolute Percentage Error | 0.135645 | 0.191204 | 0.359824 | 0.117272 |
| EVS - Explained Variance Score | 0.947521 | 0.803662 | -0.176475 | 0.948123 |
| MeAE - Median Absolute Error | 1.026574 | 1.164071 | 1.776121 | 1.049377 |
| D2 - D2 Absolute Error Score | 0.788452 | 0.676841 | -0.473825 | 0.781166 |
| Pinball - Mean Pinball Loss | 0.755070 | 0.972693 | 1.050795 | 0.767799 |
Observed vs. Predicted and Residuals vs. Predicted
Check for …
check the residuals to assess the goodness of fit.
- white noise or is there a pattern?
- heteroscedasticity?
- non-linearity?
Normality of Residuals:
Check for …
- Are residuals normally distributed?
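The normality question can be given a rough numeric answer alongside the plot. A minimal sketch in plain NumPy that computes sample skewness, excess kurtosis and the Jarque-Bera statistic; the helper name and the synthetic samples are illustrative only:

```python
import numpy as np


def normality_summary(residuals):
    """Sample skewness, excess kurtosis and the Jarque-Bera statistic."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    z = (r - r.mean()) / r.std()
    skew = np.mean(z**3)
    excess_kurtosis = np.mean(z**4) - 3.0
    jarque_bera = n / 6.0 * (skew**2 + excess_kurtosis**2 / 4.0)
    return {"skew": skew, "excess_kurtosis": excess_kurtosis, "jarque_bera": jarque_bera}


# Synthetic samples: one roughly normal, one clearly right-skewed
rng = np.random.default_rng(7)
normal_like = normality_summary(rng.normal(size=5000))
skewed = normality_summary(rng.exponential(size=5000))
```

For normally distributed residuals, skewness and excess kurtosis are both close to zero and the Jarque-Bera statistic follows approximately a chi-squared distribution with two degrees of freedom, so values far above about 6 speak against normality at the 5% level.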
Residuals Autocorrelation Plot
Compare vanilla vs. tuned
Metrics
Single split
Metrics based on the test set of the single split
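To put the comparison into numbers, here is a small sketch that computes the relative change of a few test metrics from the vanilla to the tuned model. It assumes that the first `test` column of each metrics table reported earlier is the single-split test set; the values are copied from those tables, and `relative_change` is an illustrative helper.

```python
# Test-set values copied from the first `test` column of the two metrics tables
vanilla = {"MAE": 1.960490, "RMSE": 3.795477, "R2": 0.804279}
tuned = {"MAE": 1.945387, "RMSE": 3.916159, "R2": 0.791635}


def relative_change(before, after):
    """Relative change of each metric from `before` to `after` (0.01 means +1%)."""
    return {name: (after[name] - before[name]) / abs(before[name]) for name in before}


change = relative_change(vanilla, tuned)
for name, value in change.items():
    print(f"{name}: {value:+.2%}")
```

On these numbers tuning barely moves the MAE (slightly under 1% lower), while RMSE gets about 3% worse and R2 drops by roughly 1.6%.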
Predictions, residuals, observed
Time vs. Predicted and Observed
Model details
Pipeline(steps=[('vars',
ColumnSelector(columns=['tt_tu_mean', 'rf_tu_mean', 'td_mean',
'vp_std_mean', 'tf_std_mean'])),
('model', GradientBoostingRegressor(random_state=7))])
Pipeline(steps=[('vars', ColumnSelector(columns=['rf_tu_mean', 'vp_std_mean'])),
('model',
GradientBoostingRegressor(max_depth=5, min_samples_leaf=5,
min_samples_split=48,
n_estimators=60, random_state=7,
subsample=1))])